Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP feat: new asm parser #1064

Closed
wants to merge 3 commits into from
Closed

WIP feat: new asm parser #1064

wants to merge 3 commits into from

Conversation

novusnota
Copy link
Member

@novusnota novusnota commented Nov 21, 2024

I don't know if this should be rewritten before or after the non-Ohm parser. That said, I'll start describing examples of TVM instructions and arguments that they expect, such that we can come up with a number of primitives to express those arguments.

The things people are used to are:

  • 42 -42 — decimal number literals, with in-between underscores _ allowed for readabilty (underscores _ aren't allowed in Fift, but may be allowed in Tact)
  • x{babecafe_} — hexadecimal bitstrings (up to 1023 bits) with optinal padding via the _ at the end.
  • b{010101} — binary bitstrings (up to 1023 bits) without padding.
  • "string literals" — primarily for use in debug-related instructions, so we might remove those
  • s0, s1, ..., s15 — stack registers
  • c0, c1, ..., c15 — control registers
  • MYCODE ADDRSHIFT#MOD IF: XCHG3_l 2SWAP -ROT — TVM instructions themselves. By the way, the ones ending with _l are useless and just legacy; most of those starting with a number have an alias where they end with that number (2DROP or DROP2, etc.); and IF: and related control flow-ish ones are used with continuations. Don't confuse the latter with IF:<{, which are purely Fift-guided and aren't TVM instructions per se.

The new-ish syntax proposals:

  • c{...}, a 1-to-1 replacement of Fift's B{...} B>boc, which are hex-encoded BoCs with no limit on bitsize, except for the account state limits of course. Quite important for PUSHREF and similar instructions.
  • Extra stack registers: instead of deprecated i s() syntax, users now can use any of s0, s1, ..., up to s255

The important bit is to parse deprecated things or recognize instuctions inside strings "" to provide clear error messages for users, with simple migration from the old syntax of Fift to new one of Tact.


This is another attempt at making the asm parser — now we're deliberately limiting ourselves to TVM instructions. In the current iteration it's still possible to write code in a very limited subset of Fift, but only if we won't restrict the list of supported TVM instructions in grammar.ts, i.e. during semantical actions on the lexed tokens.

TODO:

  • check that I'm going in the right direction before proceeding further (@anton-trunov)
  • grammar.ohm
  • ast.ts
  • grammar.ts
  • prettyPrinter.tswriteFunction.ts (to actually have asm instructions written)
  • hash.ts
  • compare.ts

Issue

Towards #837.
Towards #1030 (once the grammar.ts is written, there will be a small check for such gotchas).

Checklist

  • I have updated CHANGELOG.md
  • I have added tests to demonstrate the contribution is correctly implemented: this usually includes both positive and negative tests, showing the happy path(s) and featuring intentionally broken cases
  • I have run all the tests locally and no test failure was reported
  • I have run the linter, formatter and spellchecker
  • I did not do unrelated and/or undiscussed refactorings

@novusnota novusnota added this to the v1.6.0 milestone Nov 21, 2024
@anton-trunov
Copy link
Member

Let's start with concrete examples of asm-functions we would like to write and then try to develop some grammar for it. As of now, we should start with instruction sequences and comments (let's use the Tact/Fift syntax for comments).

@novusnota
Copy link
Member Author

we should start with instruction sequences and comments

i.e. no WHILE:<{ … }>DO<{ … }> loops and no similar constructs for now, right?

@anton-trunov
Copy link
Member

i.e. no WHILE:<{ … }>DO<{ … }> loops and no similar constructs for now, right?

definitely not using this syntax :)

@anton-trunov
Copy link
Member

we might allow asm-blocks to avoid introducing high-level constructions into the assembly language, so the user might able to insert those into loops, etc.

@novusnota
Copy link
Member Author

novusnota commented Nov 21, 2024

Hmm, then, something like this we've showed earlier (and have in our tests) won't work anymore:

asm fun isIntAnInt(x: Int): Bool {
    <{
        TRY:<{
            0 PUSHINT ADD DROP -1 PUSHINT
        }>CATCH<{
            2DROP 0 PUSHINT
        }>
    }>CONT 1 1 CALLXARGS
}

Proceed with removing that? This kinda goes against POLA.

UPD: Edited the original post, saying that the full range of Fift+TVM syntax there was highly experimental and subject to change in next releases. Ok, now we can proceed.

@anton-trunov
Copy link
Member

we haven't documented it, so there is no backwards compatibility violation

@anton-trunov
Copy link
Member

and Tact Kitchen posts are about raw immature things, sneak peek

@novusnota novusnota force-pushed the closes-837-new-asm-parser branch from 82a65b1 to f1562ff Compare November 22, 2024 10:55
@novusnota novusnota force-pushed the closes-837-new-asm-parser branch from e52f1ec to 40f118b Compare November 22, 2024 18:30
@novusnota novusnota mentioned this pull request Nov 24, 2024
3 tasks
]);

// NOTE: ok, I might need some help here...
export const ppAstAsmExpressionList: Printer<A.AstAsmExpressionList> =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c.braced(node.exprs.map(ppAstAsmExpression))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the order PRs get merged, you might not have this problem here at all :)

Copy link
Member Author

@novusnota novusnota Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but I've already removed the asm expression lists from the grammar, such that { ... } are prohibited. At least until we may or may not make sense of them ourselves and without Fift's aid.

Didn't remove it there in pretty printer yet, but will.

// 1. Instructions cannot contain braces
// 2. Cannot be braces themselves
// 3. Cannot contain lowercase letters, except for the very last position
asmInstruction (TVM instruction) = ~(("{" | "}") ~asmWord) asmWord
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately all newline characters get parsed as spaces and are completely erased from AST. There is no way to pretty-print this at all.

I can imagine only one way to make this work without adding ; or some other explicit way to terminate lines: put this into a separate grammar with its own spaces rule.

Copy link
Member Author

@novusnota novusnota Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm thinking of the Prettier's approach with trying to fit as much as allowed in 80, 120 or other column limits, and put all excesses on new lines. Or, to simply put a newline character after each instruction — to keep all prior primitives required for it on the same line.

The second option sounds best from legibility perspective as well:

asm fun showcase(a: Int, b: Int, c: Int): Int {
    s0 s1 s2 XCHG3
    s1 s2 s0 XCHG3
    DROP2
}

Copy link
Member Author

@novusnota novusnota Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and it would be nice to also check if the instructions are written on the same line with the opening and closing braces { ... }, i.e. when the start row of the function body equals its end row. Because if that's the case, we should respect the author's intent and inline everything there: add spaces after instructions, not newlines.

@novusnota
Copy link
Member Author

novusnota commented Dec 26, 2024

Closing this in favor of: #1258. We should discuss the resulting assembly syntax a bit more — for example, besides #1064 (comment), we might need the <{ ... }> (creates a Builder at comptime) to allow expressing anything beyond the top-level declarations (0 DECLPROC name_name_name) directly in TVM instructions and not as large hex-encoded bitstrings.

For example, when working on assembly functions page in the docs, I've replaced this:

<{ }>CONT

with

x{} SLICE BLESS,

where the former is a comptime-resolved expression by Fift, and the latter is exclusively processed by TVM — SLICE is an alias to PUSHSLICE instruction with one argument, and BLESS converts the Slice from the stack to the continuation.

Now, you could imagine that the x{...} could've been much larger and it could even be replaced with B{...} B>boc to overcome the limits of 128 bytes of the former, but it would be nigh-impossible to read and understand which instructions are encoded with them, so some macros like <{ ... }> (with Builder { ... } as an alternative syntax, for example) are needed.

@novusnota novusnota closed this Dec 26, 2024
@novusnota novusnota deleted the closes-837-new-asm-parser branch December 30, 2024 22:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants